MedLex+: An Integrated Corpus-Lexicon Medical Workbench for Swedish

Authors

  • Dimitrios Kokkinakis
  • Maria Toporowska Gronostaj

Abstract

This paper reports on the work carried out in developing MedLex+, a medical corpus-lexicon workbench for Swedish. The project, which is still under active development, has been running for several years within the Department of Swedish Language at Göteborg University. At present, the workbench incorporates: an annotated collection of medical texts, comprising 20 million tokens and 45,000 documents; a number of language processing programs, including tools for collocation extraction, compound segmentation and thesaurus-based semantic annotation; and a lexical database of medical terms, containing 5,000 medical entries. MedLex+ is a multifunctional lexical resource, owing to its structural design and to content that can be easily queried. The workbench is intended to support lexicographers compiling lexicons, as well as lexicon users with varying degrees of familiarity with the medical domain. MedLex+ can also assist researchers working on lexical semantics or on natural language processing (NLP) applications focused on medical language. The linguistically and semantically annotated medical texts, combined with a set of smart queries, turn the corpora into a rich repository of semasiological and onomasiological knowledge about medical terms and their linguistic, lexical and pragmatic properties. These properties are recorded in the lexical database with a cognitive profile. The MedLex+ workbench appears to offer constructive help in many different lexical tasks.
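The abstract lists compound segmentation among the workbench's tools but does not describe how it works. Because Swedish medical terms are often closed compounds (e.g. blodtryck, "blood pressure"), a minimal sketch of lexicon-driven segmentation may help illustrate the task. It is not the MedLex+ implementation: the toy lexicon, the single linking morpheme -s-, the MIN_PART threshold and the function name segment are all hypothetical assumptions made for illustration.

```python
from functools import lru_cache
from typing import List, Optional, Tuple

# Hypothetical toy lexicon of Swedish stems in their compound-initial forms.
# A real system would draw on a full medical lexicon (e.g. "lung" as the
# compound-initial form of "lunga").
LEXICON = {"blod", "tryck", "sjuk", "hus", "sjukdom", "förlopp", "lung", "cancer"}

# Assumed linking morphemes allowed between components ("" = no linker).
LINKERS = ("", "s")

# Assumed minimum component length, to suppress spurious short splits.
MIN_PART = 3


def segment(word: str, lexicon=LEXICON) -> Optional[List[str]]:
    """Split `word` into lexicon entries, preferring the fewest components.

    Linking morphemes are consumed but not returned. Returns None when no
    complete segmentation exists.
    """
    word = word.lower()

    @lru_cache(maxsize=None)
    def best(start: int) -> Optional[Tuple[str, ...]]:
        # Shortest tuple of components covering word[start:], or None.
        if start == len(word):
            return ()
        options = []
        for end in range(start + MIN_PART, len(word) + 1):
            part = word[start:end]
            if part not in lexicon:
                continue
            for linker in LINKERS:
                nxt = end + len(linker)
                # A non-empty linker must actually occur here and must be
                # followed by another component (no word-final linker).
                if linker and (not word.startswith(linker, end) or nxt >= len(word)):
                    continue
                rest = best(nxt)
                if rest is not None:
                    options.append((part,) + rest)
        return min(options, key=len) if options else None

    result = best(0)
    return list(result) if result is not None else None


for term in ("blodtryck", "sjukdomsförlopp", "lungcancer"):
    print(term, "->", segment(term))
```

Preferring the segmentation with the fewest components is a common heuristic against over-segmentation; a production tool would additionally need to handle truncated stems (lunga becoming lung-) and to rank genuinely ambiguous splits.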

Similar resources

Collection, Encoding and Linguistic Processing of a Swedish Medical Corpus - The MEDLEX Experience

Corpora annotated with structural and linguistic characteristics play a major role in nearly every area of language processing. During recent years a number of corpora and large data sets became known and available to research even in specialized fields such as medicine, but still however, targeted predominantly for the English language. This paper provides a description of the collection, enco...

A Semantically Annotated Swedish Medical Corpus

With the information overload in the life sciences there is an increasing need for annotated corpora, particularly with biological and biomedical entities, which is the driving force for data-driven language processing applications and the empirical approach to language study. Inspired by the work in the GENIA Corpus, which is one of the very few of such corpora, extensively used in the biomedi...

Developing Resources for Swedish Bio-Medical Text Mining

Collection and annotation of corpora in specialized fields, such as medicine, and particularly for lesser-spoken languages, than for instance English, is an important enterprise for the continuous development and growth of language technology research, for resource development and for the implementation of practical applications for these languages. In this paper, we describe our ongoing effort...

Internet as Corpus - Automatic Construction of a Swedish News Corpus

This paper describes the automatic building of a corpus of short Swedish news texts from the Internet, its application and possible future use. The corpus is aimed at research on Information Retrieval, Information Extraction, Named Entity Recognition and Multi Text Summarization. The corpus has been constructed by using an Internet agent, the so called newsAgent, downloading Swedish news text f...

Publication date: 2008